VC-Dimensions of Random Function Classes
Authors
Abstract
For any class of binary functions on [n] = {1, ..., n}, a classical result by Sauer states a sufficient condition for its VC-dimension to be at least d: its cardinality should be at least O(n^{d-1}). A necessary condition is that its cardinality be at least 2^d (which is O(1) with respect to n). How does the size of a 'typical' class of VC-dimension d compare to these two extreme thresholds? To answer this, we consider classes generated randomly by two methods: repeated biased coin flips on the n-dimensional hypercube, or uniform sampling over the space of all possible classes of cardinality k on [n]. As it turns out, the typical behavior of such classes is much closer to the necessary condition: the cardinality k need only be larger than a threshold of 2^d for the VC-dimension to be at least d with high probability. If the expected size of the class is greater than a threshold of O(log n) (which is still significantly smaller than the sufficient size of O(n^{d-1})), then it shatters every set of size d with high probability. The behavior in the neighborhood of these thresholds is described by the asymptotic probability distribution of the VC-dimension and of the largest d such that all sets of size d are shattered.
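The two sampling methods and the two quantities named in the abstract can be explored numerically for small n. The following is a minimal brute-force sketch, not the authors' method: it assumes each binary function on [n] is encoded as an integer bitmask, and the function names (`bernoulli_class`, `uniform_class`, `vc_dimension`, `all_shattered_level`) are hypothetical names chosen for illustration. The cost grows exponentially in n, so it is only meant for toy sizes.

```python
import itertools
import random


def shatters(cls, subset):
    """Return True if `cls` realizes all 2^|subset| patterns on `subset`."""
    patterns = {tuple((f >> i) & 1 for i in subset) for f in cls}
    return len(patterns) == 2 ** len(subset)


def vc_dimension(cls, n):
    """Largest d such that SOME d-subset of range(n) is shattered.

    Assumed convention: the empty class gets -1.
    """
    if not cls:
        return -1
    d = 0
    for size in range(1, n + 1):
        if len(cls) < 2 ** size:  # necessary condition: at least 2^size functions
            break
        subsets = itertools.combinations(range(n), size)
        if any(shatters(cls, s) for s in subsets):
            d = size
        else:
            break  # no shattered set of this size implies none of larger size
    return d


def all_shattered_level(cls, n):
    """Largest d such that EVERY d-subset of range(n) is shattered."""
    if not cls:
        return -1
    d = 0
    for size in range(1, n + 1):
        subsets = itertools.combinations(range(n), size)
        if all(shatters(cls, s) for s in subsets):
            d = size
        else:
            break
    return d


def bernoulli_class(n, p, rng):
    """Keep each vertex of the n-dimensional hypercube independently with probability p."""
    return {f for f in range(2 ** n) if rng.random() < p}


def uniform_class(n, k, rng):
    """Draw a class of exactly k distinct functions uniformly at random."""
    return set(rng.sample(range(2 ** n), k))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 6
    for k in (4, 8, 16, 32):
        cls = uniform_class(n, k, rng)
        print(k, vc_dimension(cls, n), all_shattered_level(cls, n))
```

Repeating the loop over many independent draws gives an empirical distribution of both quantities as k (or p) varies, which is one way to see how far a typical class sits from the 2^d and O(n^{d-1}) thresholds.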
Related resources
Error Bounds for Real Function Classes Based on Discretized Vapnik-Chervonenkis Dimensions
The Vapnik-Chervonenkis (VC) dimension plays an important role in statistical learning theory. In this paper, we propose the discretized VC dimension obtained by discretizing the range of a real function class. Then, we point out that Sauer’s Lemma is valid for the discretized VC dimension. We group the real function classes having the infinite VC dimension into four categories by using the dis...
Entropy, Combinatorial Dimensions and Random Averages
In this article we introduce a new combinatorial parameter which generalizes the VC dimension and the fat-shattering dimension, and extends beyond the function-class setup. Using this parameter we establish entropy bounds for subsets of the n-dimensional unit cube, and in particular, we present new bounds on the empirical covering numbers and gaussian averages associated with classes of functio...
Generalization Bounds and Complexities Based on Sparsity and Clustering for Convex Combinations of Functions from Random Classes
A unified approach is taken for deriving new generalization data dependent bounds for several classes of algorithms explored in the existing literature by different approaches. This unified approach is based on an extension of Vapnik’s inequality for VC classes of sets to random classes of sets that is, classes depending on the random data, invariant under permutation of the data and possessing...
Bounding Embeddings of VC Classes into Maximum Classes
One of the earliest conjectures in computational learning theory—the Sample Compression conjecture—asserts that concept classes (equivalently set systems) admit compression schemes of size linear in their VC dimension. To-date this statement is known to be true for maximum classes—those that possess maximum cardinality for their VC dimension. The most promising approach to positively resolving ...
A note on bounds for VC dimensions.
We provide bounds for the VC dimension of classes of sets formed by unions, intersections, and products of VC classes of sets 𝒞(1), …, 𝒞(m).
Journal:
- Discrete Mathematics & Theoretical Computer Science
Volume 10, Issue
Pages -
Publication date: 2008